{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "252b9f42-8736-411b-9f93-9bc716118349",
   "metadata": {},
   "source": [
    "# Scikit-Learn Permutation Importance with Categorical Features"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3136cce-b15f-4764-a9a2-82739d6e6670",
   "metadata": {},
   "source": [
    "Suppose you have a categorical variable with many unique entries. One-hot encoding breaks up such a categorical feature into several new binary features, one per category. However, for certain analyses such as permutation importance, it is often desirable to look at the importance of the original categorical feature rather than the newly created binary features.\n",
    "\n",
    "In this notebook, we walk through an example of how to deal with categorical features in a permutation importance context.\n",
    "\n",
    "In particular, we will see that pandas DataFrames and scikit-learn's [make_column_selector](https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html) are a nice combo that, together with a pipeline, allows us to analyze a categorical feature just the way we want: as a whole, without breaking it up into individual one-hot encoded variables."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae932a15-6d18-4381-9e39-426ec88efbe6",
   "metadata": {},
   "source": [
    "## Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0664201d-ff49-46a8-97b9-fcfc8a717b94",
   "metadata": {},
   "source": [
    "Suppose we have the following dataset with 2 numerical features and 1 categorical feature:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "04599cc8-af93-4cfa-933d-ac0e97164b0c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>feat1</th>\n",
       "      <th>feat2</th>\n",
       "      <th>feat3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   feat1  feat2 feat3\n",
       "0    0.0    5.0     A\n",
       "1    1.0    6.0     B\n",
       "2    2.0    7.0     B\n",
       "3    3.0    8.0     C\n",
       "4    4.0    9.0     A"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "d = {'feat1': np.arange(5.), \n",
    "     'feat2': np.arange(5., 10.), \n",
    "     'feat3': ['A', 'B', 'B', 'C', 'A']}\n",
    "\n",
    "y_train = np.arange(5)\n",
    "X_train = pd.DataFrame(d)\n",
    "X_train['feat3'] = X_train['feat3'].astype('category')\n",
    "\n",
    "\n",
    "X_train"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c848a97-e3e6-4401-a726-06dd67305bf4",
   "metadata": {},
   "source": [
    "Note that we converted the categorical feature to pandas' `category` dtype above. In the next section, you will see why ;)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "90933b57-dfb2-44b4-b6e0-b444af370545",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "feat1     float64\n",
       "feat2     float64\n",
       "feat3    category\n",
       "dtype: object"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_train.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1f5da2a-d975-4282-badd-8c2db9e43923",
   "metadata": {},
   "source": [
    "Also, let's create a test set partition for the permutation analysis later:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8ebbb95a-4296-4537-a333-ef8c5234debb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>feat1</th>\n",
       "      <th>feat2</th>\n",
       "      <th>feat3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   feat1  feat2 feat3\n",
       "0    0.0    4.0     A\n",
       "1    1.0    5.0     B\n",
       "2    2.0    6.0     B\n",
       "3    3.0    7.0     C"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d2 = {'feat1': np.arange(4.), \n",
    "      'feat2': np.arange(4., 8.), \n",
    "      'feat3': ['A', 'B', 'B', 'C']}\n",
    "\n",
    "y_test = np.arange(4)\n",
    "X_test = pd.DataFrame(d2)\n",
    "X_test['feat3'] = X_test['feat3'].astype('category')\n",
    "\n",
    "\n",
    "X_test"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6fce0d70-0d5e-4e37-8669-5c3986726697",
   "metadata": {},
   "source": [
    "## Independent One-Hot Encoded Features"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a2afbc7-3b83-40ce-bc1e-85ea14c26f28",
   "metadata": {},
   "source": [
    "In this section, we first look at the standard approach, where the categorical feature gets broken up into several binary features."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6037bb9-179e-4318-a504-71df9d4fb47e",
   "metadata": {},
   "source": [
    "First, we define a data transformer that scales the numerical features (a standard procedure recommended for many ML algorithms) and one-hot encodes the categorical feature:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "5978a163-3316-497c-af1c-7765b1af5e18",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.compose import make_column_transformer, make_column_selector\n",
    "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n",
    "\n",
    "\n",
    "ct = make_column_transformer(\n",
    "       (StandardScaler(), # scale numerical data\n",
    "        make_column_selector(dtype_include=np.number)),\n",
    "       (OneHotEncoder(), # one-hot encode categorical data\n",
    "        make_column_selector(dtype_include=\"category\")))\n",
    "\n",
    "X_train_t = ct.fit_transform(X_train)\n",
    "X_test_t = ct.transform(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "108df113-75a7-474b-8904-66cff2a50920",
   "metadata": {},
   "source": [
    "Looking at the code above, that's where pandas DataFrames come in handy. Using `make_column_selector` with specific column types, we can choose which features we pass to which preprocessor. Neat!\n",
    "\n",
    "(Note that `make_column_selector` picks columns by their dtype, which requires a pandas DataFrame. In a NumPy array, all columns share a single dtype, so the column transformer approach above would not work with NumPy arrays.)"
   ]
  },
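  {
   "cell_type": "markdown",
   "id": "sketch-column-selector-md",
   "metadata": {},
   "source": [
    "To make the column selection a bit more tangible, here is a minimal sketch of what the two selectors return when called on a DataFrame. The cell rebuilds the toy data under the illustrative name `demo_df` so that it runs on its own; `num_cols` and `cat_cols` are likewise names made up for this sketch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-column-selector-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.compose import make_column_selector\n",
    "\n",
    "# Rebuild the toy training frame (demo_df is an illustrative name)\n",
    "demo_df = pd.DataFrame({'feat1': np.arange(5.),\n",
    "                        'feat2': np.arange(5., 10.),\n",
    "                        'feat3': ['A', 'B', 'B', 'C', 'A']})\n",
    "demo_df['feat3'] = demo_df['feat3'].astype('category')\n",
    "\n",
    "# A selector is a callable: given a DataFrame, it returns the matching column names\n",
    "num_cols = make_column_selector(dtype_include=np.number)(demo_df)\n",
    "cat_cols = make_column_selector(dtype_include='category')(demo_df)\n",
    "\n",
    "print('numerical columns:  ', num_cols)\n",
    "print('categorical columns:', cat_cols)"
   ]
  },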
  {
   "cell_type": "markdown",
   "id": "b56e6c9d-6743-4257-a71d-487094acea3a",
   "metadata": {},
   "source": [
    "Next, let's fit our predictive model based on the transformed data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "56ef796f-f9bc-4ace-9ab8-8bb4f315cef8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "model = LogisticRegression(random_state=0)\n",
    "model.fit(X_train_t, y_train);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba3f4de1-9206-4f93-b4ad-968aaf7d76e1",
   "metadata": {},
   "source": [
    "Finally, let's compute the feature importance and visualize it in a bar plot:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "d23ef8de-573e-475f-8c5d-4abb445ea292",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.inspection import permutation_importance\n",
    "\n",
    "r = permutation_importance(model, X_test_t, y_test,\n",
    "                           n_repeats=200,\n",
    "                           random_state=0)"
   ]
  },
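  {
   "cell_type": "markdown",
   "id": "sketch-feature-names-md",
   "metadata": {},
   "source": [
    "The importance values in `r` follow the column order of the transformed data: first the scaled numerical columns, then one binary column per category. As a sanity check, the sketch below recovers that order from a fitted column transformer. It assumes scikit-learn 1.0 or newer, where `get_feature_names_out` is available on the column transformer, and it rebuilds the data and transformer under the illustrative names `demo_df` and `demo_ct` so that the cell runs on its own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-feature-names-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.compose import make_column_selector, make_column_transformer\n",
    "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n",
    "\n",
    "# Rebuild the toy training frame (demo_ names are illustrative)\n",
    "demo_df = pd.DataFrame({'feat1': np.arange(5.),\n",
    "                        'feat2': np.arange(5., 10.),\n",
    "                        'feat3': ['A', 'B', 'B', 'C', 'A']})\n",
    "demo_df['feat3'] = demo_df['feat3'].astype('category')\n",
    "\n",
    "demo_ct = make_column_transformer(\n",
    "    (StandardScaler(), make_column_selector(dtype_include=np.number)),\n",
    "    (OneHotEncoder(), make_column_selector(dtype_include='category')))\n",
    "demo_ct.fit(demo_df)\n",
    "\n",
    "# Assumption: scikit-learn >= 1.0 (get_feature_names_out on ColumnTransformer)\n",
    "demo_names = demo_ct.get_feature_names_out()\n",
    "\n",
    "# Drop the transformer prefix to get short labels for plotting\n",
    "demo_labels = [name.split('__')[-1] for name in demo_names]\n",
    "print(demo_labels)"
   ]
  },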
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "b114cb25-00b6-4c42-bf73-6954c1748f08",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEVCAYAAADtmeJyAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAY1UlEQVR4nO3de5RlZX3m8e9DIxcRaITGKNBclIstAsEG0TiKOgqoiERHAZER0ZYoIpJZysh4ScysaGY06ICSDhIvWYpOgtoql1GiomNYAQSF1qC9WoUWMooKIgyXht/8sXfpmaKq+vRln9N99vez1ll19qVO/d7q0/Wc/e53vztVhSSpvzYbdwGSpPEyCCSp5wwCSeo5g0CSes4gkKSeMwgkqec6C4IkFyb5eZIbZ9meJB9KsiLJ95Ic3FUtkqTZdXlE8DHgyDm2HwXs3T6WAB/psBZJ0iw6C4KquhL41Ry7HAN8ohpXAfOTPLareiRJMxvnOYJdgFsGlle16yRJI7T5GH92Zlg343wXSZbQdB+xzTbbPGW//fbrsi5JmjjXXnvt7VW1YKZt4wyCVcBuA8u7ArfOtGNVLQWWAixevLiuueaa7quTpAmS5KezbRtn19Ay4KR29NBhwJ1VddsY65GkXursiCDJp4HDgZ2SrALeBTwCoKrOBy4BXgCsAO4BTu6qFknS7DoLgqo6fg3bC3hjVz9fkjQcryyWpJ4zCCSp5wwCSeo5g0CSes4gkKSeMwgkqecMAknqOYNAknrOIJCknjMIJKnnDAJJ6jmDQJJ6ziCQpJ4zCCSp5wwCSeo5g0CSes4gkKSeMwgkqecMAknqOYNAknrOIJCknjMIJKnnDAJJ6jmDQJJ6ziCQpJ4zCCSp5wwCSeo5g0CSes4gkKSeMwgkqecMAknqOYNAknrOIJCknjMIJKnnDAJJ6rlOgyDJkUluSrIiyVkzbN8+yReTfDfJ8iQnd1mPJOnhOguCJPOA84CjgEXA8UkWTdvtjcD3q+pA4HDg/Um26KomSdLDdXlEcCiwoqpWVtX9wEXAMdP2KWDbJAEeBfwKWN1hTZKkaboMgl2AWwaWV7XrBp0LPBG4FbgBeHNVPTT9hZIsSXJNkmt+8YtfdFWvJPVSl0GQGdbVtOUjgOuBxwEHAecm2e5h31S1tKoWV9XiBQsWbOg6JanXugyCVcBuA8u70nzyH3QycHE1VgA/BvbrsCZJ0jRdBsHVwN5J9mxPAB8HLJu2z83AcwGSPAbYF1jZYU2SpGk27+qFq2p1ktOAy4F5wIVVtTzJqe3284H3AB9LcgNNV9Lbqur2rmqSJD1cZ0EAUFWXAJdMW3f+wPNbged3WYMkaW5eWSxJPWcQSFLPGQSS1HMGgST1nEEgST1nEEhSzxkEktRzBoEk9ZxBIEk9ZxBIUs8ZBJLUcwaBJPXcUEGQZOsk+3ZdjCRp9NYYBEmOprmL2GXt8kFJpt9XQJK0iRrmiODdNDeivwOgqq4H9uiqIEnSaA0TBKur6s7OK5EkjcUwN6a5MckJwLwkewOnA9/utixJ0qgMc0TwJuBJwH3Ap4A7gTM6rEmSNEJrPCKoqnuAs9uHJGnCDDNq6CtJ5g8s75Dk8k6rkiSNzDBdQztV1R1TC1X1a2DnziqSJI3UMEHwUJKFUwtJdgequ5IkSaM0zKihs4FvJflGu/xMYEl3JUmSRmmYk8WXJTkYOAwI8Jaqur3zyiRJIzHMEQHAlsCv2v0XJaGqruyuLEnSqKwxCJK8D3gFsBx4qF1dgEEgSRNgmCOClwD7VtV9HdciSRqDYUYNrQQe0XUhkqTxGOaI4B7g+iRX0EwzAUBVnd5ZVZKkkRkmCJa1D0nSBBpm+OjHR1GIJGk8hhk1tDfwl8AiYKup9VW1V4d1SZJGZJiTxX8HfARYDTwb+ATwyS6LkiSNzjBBsHVVXQGkqn5aVe8GntNtWZKkURkmCO5NshnwoySnJTmWIWcfTXJk
kpuSrEhy1iz7HJ7k+iTLB+YzkiSNyDCjhs4AHklzi8r30HQPnbSmb0oyDzgPeB6wCrg6ybKq+v7APvOBDwNHVtXNSZzeWpJGbJgjgj2q6rdVtaqqTq6qlwIL1/hdcCiwoqpWVtX9wEXAMdP2OQG4uKpuBqiqn69N8ZKk9TdMEPznIddNtwtwy8DyqnbdoH2AHZJ8Pcm1SdZ4pCFJ2rBm7RpKchTwAmCXJB8a2LQdzQiiNckM66bf0GZz4CnAc4GtgX9OclVV/XBaLUto74GwcOEwByOSpGHNdURwK3ANcC9w7cBjGXDEEK+9CthtYHnX9jWn73NZVd3d3uPgSuDA6S9UVUuranFVLV6wYMEQP1qSNKxZjwiq6rtJbgSev45XF18N7J1kT+BnwHE05wQGfQE4N8nmwBbAU4G/XoefJUlaR3OOGqqqB5PsmGSL9oTv0KpqdZLTgMuBecCFVbU8yant9vOr6gdJLgO+R3Ovgwuq6sZ1a4okaV2kau770Cf5G+Bgmi6hu6fWV9UHui1tZosXL65rrrlmHD9akjZZSa6tqsUzbRvmOoJb28dmwLYbsjBJ0vgNM/vonwEk2bZZrN92XpUkaWTWeB1Bkv2TXAfcCCxvx/s/qfvSJEmjMMwFZUuBM6tq96raHfhT4G+7LUuSNCrDBME2VfW1qYWq+jqwTWcVSZJGapiTxSuTvIPf34PgRODH3ZUkSRqlYY4IXgMsAC4GPtc+P7nLoiRJozPMqKFfA6cn2R54qKru6r4sSdKoDDNq6JAkNwDfBW5I8t0kT+m+NEnSKAxzjuCjwBuq6psASZ5Bcx/jA7osTJI0GsOcI7hrKgQAqupbgN1DkjQhhjki+Jd2vqFP09xP4BXA15McDFBV3+mwPklSx4YJgoPar++atv7pNMHwnA1ZkCRptIYZNfTsURQiSRqPNQZBkvnAScAeg/tX1emdVSVJGplhuoYuAa4CbqC5eYwkaYIMEwRbVdWZnVciSRqLYYaPfjLJ65I8Nsmjpx6dVyZJGolhjgjuB/4bcDbNKCHar3t1VZQkaXSGCYIzgSdU1e1dFyNJGr1huoaWA/d0XYgkaTyGOSJ4ELg+ydeA+6ZWOnxUkibDMEHw+fYhSZpAw1xZ/PFRFCJJGo9ZgyDJZ6vq5e29CGr69qpyGmpJmgBzHRG8uf36olEUIkkaj1mDoKpua7/+dHTlSJJGbZjho5KkCWYQSFLPDRUESbZOsm/XxUiSRm+NQZDkaOB64LJ2+aAkyzquS5I0IsMcEbwbOBS4A6Cqrqe5SY0kaQIMEwSrq+rOziuRJI3FMFNM3JjkBGBekr2B04Fvd1uWJGlUhjkieBPwJJoJ5z4F3Amc0WFNkqQRmjMIkswDllXV2VV1SPv4L1V17zAvnuTIJDclWZHkrDn2OyTJg0letpb1S5LW05xBUFUPAvck2X5tX7gNkfOAo4BFwPFJFs2y3/uAy9f2Z0iS1t8w5wjuBW5I8hXg7qmVQ9yP4FBgRVWtBEhyEXAM8P1p+70J+EfgkGGLliRtOMMEwZfbx9raBbhlYHkV8NTBHZLsAhwLPIc5giDJEmAJwMKFC9ehFEnSbLq8H0Fmerlpy+cAb6uqB5OZdv9dDUuBpQCLFy9+2JTYkqR1t8YgSPJjZr4fwV5r+NZVwG4Dy7sCt07bZzFwURsCOwEvSLK6qj6/prokSRvGMF1DiweebwX8B+DRQ3zf1cDeSfYEfgYcB5wwuENV7Tn1PMnHgC8ZApI0Wmu8jqCqfjnw+FlVnUPTp7+m71sNnEYzGugHwGeranmSU5Ocur6FS5I2jGG6hg4eWNyM5ghh22FevKouAS6Ztu78WfZ99TCvKUnasIbpGnr/wPPVwI+Bl3dTjiRp1IYJglOmrgWY0vb7S5ImwDBzDf3DkOskSZugWY8IkuxHM9nc9kn+eGDTdjSjhyRJE2CurqF9gRcB84GjB9bfBbyuw5okSSM0axBU1ReALyR5WlX98whrkiSN
0DAni69L8kaabqLfdQlV1Ws6q0qSNDLDnCz+JPAHwBHAN2imiriry6IkSaMzTBA8oareAdzdTkD3QuDJ3ZYlSRqVYYLggfbrHUn2B7YH9uisIknSSA1zjmBpkh2AdwDLgEcB7+y0KknSyAxzP4IL2qffANY09bQkaROzxq6hJI9J8tEkl7bLi5Kc0n1pkqRRGOYcwcdoppJ+XLv8Q+CMjuqRJI3YMEGwU1V9FngIfnefgQc7rUqSNDLDBMHdSXakvV1lksOAOzutSpI0MsOMGjqTZrTQ45P8b2AB8LJOq5Ikjcxcs48urKqbq+o7SZ5FMwldgJuq6oHZvk+StGmZq2vo8wPPP1NVy6vqRkNAkibLXEGQgedePyBJE2quIKhZnkuSJshcJ4sPTPIbmiODrdvntMtVVdt1Xp0kqXNz3Zhm3igLkSSNxzDXEUiSJphBIGm9HX744Rx++OHjLkPryCCQpJ4zCCSp5wwCSeo5g0CSes4gkKSeMwgkqecMAknqOYNAknrOIJCknus0CJIcmeSmJCuSnDXD9lcm+V77+HaSA7usR5L0cJ0FQZJ5wHnAUcAi4Pgki6bt9mPgWVV1APAeYGlX9UiSZtblEcGhwIqqWllV9wMXAccM7lBV366qX7eLVwG7dliPJGkGXQbBLsAtA8ur2nWzOQW4tMN6JEkzmOvGNOsrM6yb8U5nSZ5NEwTPmGX7EmAJwMKFCzdUfZIkuj0iWAXsNrC8K3Dr9J2SHABcABxTVb+c6YWqamlVLa6qxQsWLOikWEnqqy6D4Gpg7yR7JtkCOA5YNrhDkoXAxcCrquqHHdYiSZpFZ11DVbU6yWnA5cA84MKqWp7k1Hb7+cA7gR2BDycBWF1Vi7uqSZL0cF2eI6CqLgEumbbu/IHnrwVe22UNkqS5eWWxtIF4u0ZtqgwCSeo5g0CSes4g2MD63D3Q57ZLmzKDQJJ6ziCQpJ4zCCSp5wwCSeq5Ti8ok7Rx2OOsL3f6+v+28pcj+Tk/ee8LO339vvKIQJJ6ziCQpJ4zCCSp5wwCSeo5g0CSes4gkKSeMwgkqecMAknqOYNAknrOIJCknjMIJKnnDAJJ6jmDQJJ6ziCQpJ4zCCSp57wfgXqh63nywTn5tenyiECSes4gkKSeMwgkqecMAknqOYNAknrOIJCknjMIJKnnDAJJ6jmDQJJ6ziCQpJ7rdIqJJEcCHwTmARdU1XunbU+7/QXAPcCrq+o7XdXT92kGuq5pVG0Hp1nY2PzBCe9d807aaHV2RJBkHnAecBSwCDg+yaJpux0F7N0+lgAf6aoeSdLMuuwaOhRYUVUrq+p+4CLgmGn7HAN8ohpXAfOTPLbDmiRJ06Squnnh5GXAkVX12nb5VcBTq+q0gX2+BLy3qr7VLl8BvK2qrpn2WktojhgA9gVu6qToDWcn4PZxFzEmfW479Lv9tn3jtntVLZhpQ5fnCDLDuumpM8w+VNVSYOmGKGoUklxTVYvHXcc49Lnt0O/22/ZNt+1ddg2tAnYbWN4VuHUd9pEkdajLILga2DvJnkm2AI4Dlk3bZxlwUhqHAXdW1W0d1iRJmqazrqGqWp3kNOBymuGjF1bV8iSnttvPBy6hGTq6gmb46Mld1TNim0w3Vgf63Hbod/tt+yaqs5PFkqRNg1cWS1LPGQSS1HMGgST1nEEwAu2cSr3m70CD+vZ+SPLoJDuPu47ZGAQdSrJXku2qqnr4xn9GkhOSvBKgej4qoZ17a+p57/7fJXl8kpcmORr69X5Ish/NUPljk+w/7npm0rs35KgkOYZm6Ow7kuzYpzBIchRwAfA44Jwk7x5vReOV5PHAl5O8un0vPDSwbeLfE0n2Bb4IHAJcmGRShomvUdv2zwF/V1V/U1U3jrummRgEHUiyI/AWmjf/b4Azkzy6D2GQZC/gL4A/qar/TjP54FFJ9pj0ts9hG+Bg4EXAVUmOTXIQ/P6T8aT+bto/hH8P/EVVnQWcCmyb5MDxVta99t/0T4CP
V9VHp63fqBgEHaiqXwKvB/4c+CawJfCfkizowSHxA8C7quprSbYEbgPuBrbuQdtnczNwJXAucDqwD/D+JG9MMh8muqvkROBxVfWpdvndwPOAzyX5qyRbja2yjrX/pvcBKwHaGRYGw3+f8VX3/zMINqAkjxpY/HVV3VFVX6fpH9yC5iiBJAck2W4MJXZmqu1VdQvt7LBVdV9V3Uszf9Tm7X4HjavGcamqO4BPAedU1aU0XQVPBF5L03V27hjL61RVvQP4pyRXJLkUWFZVRwPPoJlV4HVjLbB782im16Gq7k+y+cARwdOS7DK+0n7PINhA2k82Ryd5SZITgNck2brd/E2abqJ7klwJfA141CwvtckZaPuxSY4DXppki4ETpDsCW7W/l88kmXEq3EmVJFV1Mc2n4DcDFwPvqao/BD5Kcz5l4kz9+1fVq4AbgUVVdXa77lbgQ8A2G2NXyfoaaNP7gNVJ3g7N1DttF/EzgTcAG8URkVNMbCDtm/4AmhvwbAU8uap+k2Tzqlrd7nMezaegF1fVDeOrdsOao+2PqKoH2k+8uwCPBt64sZ4w61qSNwB/DfxpVZ3brtts8OTxpEkyr6oebJ//PbBzVT0/ycHAp2neD18da5Edav9vPB94Fc20+x8Bdgb+K3BmVXV/X9cheESwnqaSv32z3wXcAfwQmBomtzrJvCQ7AEcAx05KCAzR9gfaXW+nGTHy+j6EwGyfcKvqwzRhOdVNlkkMgcH2V9WDA0cGJwK3JVkOXAicMWkhMP3fvv2/8VXgrcCdwEk0gwbesrGEAHhEsF7a/8hTJ37mt33BJHk6cBZwaVV9JMkhwHKa80T/d2wFb0Br0faDgScD/9SeP5hISR4JPKKq7myXf/f7aZfntX8UTwIOA86qqt+MqdwNbtj2t88/BFxWVZeMp9oNa01tn2H/qSPlOfcbJYNgA0hyJvAsmm6RpVX1j+1Y+jfTjJjZGXhF2y86UdbQ9nuA+cAJVfVv46uyW0meSDMi6AHg+1V15hz77gI8pqq+M6r6ujZs+wfDoF3eaP4Qrqu1aHsGh49vdO2uKh/r8aAZJ/w1mrHinwEeBF7dbnsS8FfAE8ddp23v7HewL3Ad8B9pAn8lsNcs+2bg+SOAeeOuf5Ttb/ffvP06b1Nv/zq0fd5A2zOKGod9eI5gPbTpfjfN8LDX09xv+UhgaZLXVNXyqnprVf1gnHV2oc9tn5JmqoinA/+jqj5eVT8Hfg28Nslb01xYOGiz9vt2AD4ObNJDiNe2/e0Rweq2/Z9kE27/Orb9wYG2zx950XMZdxJtKg+aM/6bDSxvMfD8scBXgN3b5S8CP6N5o282qhpt+1h+N1sPPD+XZmjok4HPAx8c2Db1aXB++/t6zrhrt/22ferhOYIhJXlUVf22fX4G8HhgJ+BsmlEx7wf+F7ADzZWjH6gJOSfQ57bPpL0+ZJ+q+m6aKRRW03QLPKGqftTuszPwZZqhwre163ag6UJ7T1V9czzVr78+t39S297ZPYsnSZIXA8cApyQ5sX3+QuBfgSVVdVaS79OcNH0mcPyk/CHsc9vn8BjgpDRXU+8HnFzNJ6ofDeyzB3AvzUnEqa6EjwF/uTH+IVhLfW7/ZLZ93IckG/uD5qrYr9Kc/FwInAMsprkq8FJgy2n7bz/umm37SH43b6e5duIDA+tCcx7gmTRX0h497Xu2HXfdtt+2z/Swa2gNkmwL/E+aWUTn0XwSPgT4Lc2Q0AeSvBOgqv58EobETelz22cy7dqJpwFPBZ4GXFlV57Xr9weOBa6rqi9ttMMF10Gf2z/pbbdraA2q6q4kVwDvAv6M5hDvmzRTLe+U5FnAHwPHt/tv9P/ow+pz26cbGAf+R8AjaSYVPCfJj4AlSX4LXEVzVfUFVXXbJAVjn9vfi7aP+5BkU3gAuwP/nuYT8QnAv6M58XMRcBmw/7hrtO0j+V28APgBcArwS+Al7fojaEaD3AwcMe46bb9tX+v2jbuATekBPAVY
QfMJeB7NRUHbj7su2z6S9u8J/AuwN/Bimqm27wNe1W7fjgm+eK7P7e9D28dewKb2AA6kGTL5hnHXYttH2vbNgCfQzBN0XbvuROAh4KRx12f7bfv6PDxHsJaqGT98ODARk8etjT61feBK0H2ArWnmkVmR5ADg6+1uP6O54cz/GVOZnelz+/vYdkcNSQOS7E7z/+InSV5Ic2ORbwDPpekWeALN3PI300wnfGJVXbfJnRycRZ/b3+e2e0QgtZLsQXMC/JQ095c9leamIgcBzwFuraoftsMC96G5wcx1MBkjpvrc/j63HTwikIDfTaL3UuCPaC6ceyXN9AEPAi+nuWJ6ZZLnVdVXBr9vIv4Q9Lj9fW77FINAaiWZTzNEcEuaLoDjaT4RPq+aseGHAX9L84dh4u601uf297ntYNeQNOg3NPPL708zZcAHaW4uf3KSLWkunnv7JP4haPW5/X1uu0cE0nTtScMraOaU+VeaqQR2AL5dVVdOUpfATPrc/r623SCQZpDkD2mGB55XVeeOu55R63P7+9h2g0CaRZKDgX+gGT74k0n8JDiXPre/b203CKQ5JNm2qu4adx3j0uf296ntBoE0h0ntEx5Wn9vfp7YbBJLUc5uNuwBJ0ngZBJLUcwaBJPWcQSBJPWcQSFLPGQSS1HP/D9vKetQdKhZbAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.bar(range(r['importances_mean'].shape[0]), r['importances_mean'],\n",
    "        yerr=r['importances_std'], align=\"center\")\n",
    "\n",
    "plt.ylabel('Feature importance')\n",
    "\n",
    "plt.xticks(range(5),\n",
    "           ['feat1', 'feat2', 'feat3_A', 'feat3_B', 'feat3_C'], rotation=45)\n",
    "\n",
    "plt.ylim([0, 1.])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "278b0b9a-6e24-4a7f-8e53-52cd3f3dcbc9",
   "metadata": {},
   "source": [
    "As we can see above, each category of the categorical feature is treated as its own variable. However, for many real-world problems (let's say you have a biological dataset and want to know how important it is to include information about the amino acid type, a categorical variable with 20 unique values), it would be more useful to know the overall importance of the categorical feature. Let's look at this in the next section."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e774687f-6d35-432e-8b5c-fbdccd6a4b00",
   "metadata": {},
   "source": [
    "## Grouped One-Hot Encoded Features"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a96f2ad3-3479-493a-97fa-7a07463641bd",
   "metadata": {},
   "source": [
    "Instead of breaking up the categorical feature into individual binary features prior to the permutation importance analysis, let's see how we can analyze the categorical feature as a whole.\n",
    "\n",
    "Here, all we have to do is combine our column transformer and predictive model into a pipeline, pass that pipeline to the `permutation_importance` function, and give it the untransformed test set:"
   ]
  },
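  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "d0483fd0-78ab-4a83-8217-bb37d1d38729",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.pipeline import make_pipeline\n",
    "\n",
    "# ct and model were already fitted above, so the pipeline is used without refitting\n",
    "model_pipe = make_pipeline(ct, model)\n",
    "\n",
    "# pass the untransformed test set; each original column is permuted as a whole\n",
    "r = permutation_importance(model_pipe, X_test, y_test,\n",
    "                           n_repeats=200,\n",
    "                           random_state=0)"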
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "c288abb0-403a-48e2-bdb3-9c4cd365dd9d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEMCAYAAADJQLEhAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAUpUlEQVR4nO3da7BlZX3n8e+PRgSRm9AmCjRg0gNpFShs0Mk4Y2sq4SKkYzQKxFgSTctEREJSIzMM6sQXIZWYaAqU6UFidCYhZGS0oy1UilGMUSpcgkJrMG0j0qKJKHfk0s1/XqzVzs7hXBbNWXt37/X9VO06e132Pn+e4tSvn2c961mpKiRJw7XLpAuQJE2WQSBJA2cQSNLAGQSSNHAGgSQNnEEgSQPXWxAkuSzJvyS5dY7jSfInSTYm+WqSY/qqRZI0tz57BB8FTpjn+InA8va1Bvhwj7VIkubQWxBU1ReAH85zymrgY9W4Dtg3yfP6qkeSNLtJXiM4ELhzZHtzu0+SNEa7TvB3Z5Z9s653kWQNzfARe+6550uOOOKIPuuSpKlz44033l1VS2c7Nskg2AwcPLJ9EHDXbCdW1VpgLcDKlSvrhhtu6L86SZoiSe6Y69gkh4bWAW9qZw+9DLivqr47wXokaZB66xEk+QtgFXBAks3Ae4BnAFTVJcB64CRgI/AwcEZftUiS5tZbEFTVaQscL+Dtff1+SVI33lksSQNnEEjSwBkEkjRwBoEkDZxBIEkDZxBI0sAZBJI0cAaBJA2cQSBJA2cQSNLAGQSSNHAGgSQNnEEgSQNnEEjSwBkEkjRwBoEkDZxBIEkDZxBI0sAZBJI0cAaBJA2cQSBJA2cQSNLAGQSSNHAGgSQNnEEgSQNnEEjSwBkEkjRwBoEkDZxBIEkDZxBI0sAZBJI0cAaBJA2cQSBJA2cQSNLAGQSSNHC9BkGSE5LclmRjkvNmOb5Pkr9O8pUkG5Kc0Wc9kqQn6y0IkiwBLgZOBFYApyVZMeO0twNfq6qjgFXA+5Ps1ldNkqQn67NHcBywsao2VdVjwOXA6hnnFLBXkgDPBn4IbOmxJknSDH0GwYHAnSPbm9t9oy4Cfga4C7gFeGdVPTHzi5KsSXJDkhu+//3v91WvJA1Sn0GQWfbVjO3jgZuB5wNHAxcl2ftJH6paW1Urq2rl0qVLF7tOSRq0PoNgM3DwyPZBNP/yH3UGcGU1NgK3A0f0WJMkaYY+g+B6YHmSw9oLwKcC62ac823g5wCS/ARwOLCpx5okSTPs2tcXV9WWJGcBVwNLgMuqakOSM9vjlwDvAz6a5BaaoaR3VdXdfdUkSXqy3oIAoKrWA+tn7Ltk5P1dwC/0WYMkaX7eWSxJA2cQSNLAGQSSNHAGgSQNnEEgSQNnEEjSwBkEkjRwBoEkDZxBIEkDZxBI0sAZBJI0cAaBJA1cpyBIskeSw/suRpI0fgsGQZJTaJ4idlW7fXSSmc8VkCTtpLr0CN5L8yD6ewGq6mbg0L4KkiSNV5cg2FJV9/VeiSRpIro8mObWJKcDS5IsB84GvtRvWZKkcenSI3gH8ELgUeDPgfuAc3qsSZI0Rgv2CKrqYeD89iVJmjJdZg39TZJ9R7b3S3J1r1VJGrRVq1axatWqSZcxGF2Ghg6oqnu3bVTVPcBze6tIkjRWXYLgiSTLtm0kOQSo/kqSJI1Tl1lD5wNfTHJtu/0fgDX9lSRJGqcuF4uvSnIM8DIgwG9V1d29VyZJGosuPQKAZwI/bM9fkYSq+kJ/ZUmSxmXBIEjy+8AbgA3AE+3uAgwCSZoCXXoEvwQcXlWP9lyLJGkCuswa2gQ8o+9CJEmT0aVH8DBwc5JraJaZAKCqzu6tKknS2HQJgnXtS5I0hbpMH/2zcRQiSZqMLrOGlgO/B6wAdt+2v6pe0GNdkqQx6XKx+E+BDwNbgFcCHwM+3mdRkqTx6RIEe1TVNUCq6o6qei/wqn7LkiSNS5cgeCTJLsA/
JTkryWvouPpokhOS3JZkY5Lz5jhnVZKbk2wYWc9IkjQmXWYNnQM8i+YRle+jGR5600IfSrIEuBj4eWAzcH2SdVX1tZFz9gU+BJxQVd9O4vLWkjRmXXoEh1bVg1W1uarOqKrXAssW/BQcB2ysqk1V9RhwObB6xjmnA1dW1bcBqupfnkrxkqSnr0sQ/OeO+2Y6ELhzZHtzu2/UvwH2S/L5JDcmWbCnIUlaXHMODSU5ETgJODDJn4wc2ptmBtFCMsu+mQ+02RV4CfBzwB7Al5NcV1XfmFHLGtpnICxb1qUzIknqar4ewV3ADcAjwI0jr3XA8R2+ezNw8Mj2Qe13zjznqqp6qH3GwReAo2Z+UVWtraqVVbVy6dKlHX61JKmrOXsEVfWVJLcCv7CddxdfDyxPchjwHeBUmmsCoz4FXJRkV2A34KXAH2/H75Ikbad5Zw1V1dYk+yfZrb3g21lVbUlyFnA1sAS4rKo2JDmzPX5JVX09yVXAV2medXBpVd26ff8pkqTt0WX66B3A3yVZBzy0bWdV/dFCH6yq9cD6GfsumbH9B8AfdKpWkrTougTBXe1rF2CvfsuRJI1bl9VH/xtAkr2azXqw96okSWOz4H0ESV6U5B+AW4EN7Xz/F/ZfmiRpHLrcULYWOLeqDqmqQ4DfBv5Hv2VJksalSxDsWVWf27ZRVZ8H9uytIknSWHW5WLwpyQX8/2cQvBG4vb+SJEnj1KVH8OvAUuBK4P+078/osyhJ0vh0mTV0D3B2kn2AJ6rqgf7LkiSNS5dZQ8cmuQX4CnBLkq8keUn/pUmSxqHLNYKPAL9ZVX8LkOTlNM8xPrLPwiRJ49HlGsED20IAoKq+CDg8JElTokuP4O+T/HfgL2ieJ/AG4PNJjgGoqpt6rE+S1LMuQXB0+/M9M/b/LE0wvGoxC5IkjVeXWUOvHEchkqTJWDAIkuwLvAk4dPT8qjq7t6okSWPTZWhoPXAdcAvNw2MkSVOkSxDsXlXn9l6JJGkiukwf/XiS30jyvCTP2fbqvTJJ0lh06RE8RvMoyfNpZgnR/nxBX0VJksanSxCcC/x0Vd3ddzGSpPHrMjS0AXi470IkSZPRpUewFbg5yeeAR7ftdPqoJE2HLkHwyfYlSZpCXe4s/rNxFCJJmow5gyDJFVX1+vZZBDXzeFW5DLU0pQ497zMT/f3f2/SDHaIOgG9d+OpJl9C7+XoE72x/njyOQiRJkzFnEFTVd9ufd4yvHEnSuHWZPipJmmIGgSQNXKcgSLJHksP7LkaSNH4LBkGSU4Cbgava7aOTrOu5LknSmHTpEbwXOA64F6CqbqZ5SI0kaQp0CYItVXVf75VIkiaiyxITtyY5HViSZDlwNvClfsuSJI1Llx7BO4AX0iw49+fAfcA5PdYkSRqjeYMgyRJgXVWdX1XHtq//WlWPdPnyJCckuS3JxiTnzXPesUm2JnndU6xfkvQ0zRsEVbUVeDjJPk/1i9sQuRg4EVgBnJZkxRzn/T5w9VP9HZKkp6/LNYJHgFuS/A3w0LadHZ5HcBywsao2ASS5HFgNfG3Gee8APgEc27VoSdLi6RIEn2lfT9WBwJ0j25uBl46ekORA4DXAq5gnCJKsAdYALFu2bDtKkSTNpc/nEWS2r5ux/QHgXVW1NZnt9B/XsBZYC7By5conLYktSdp+CwZBktuZ/XkEL1jgo5uBg0e2DwLumnHOSuDyNgQOAE5KsqWqPrlQXZKkxdFlaGjlyPvdgV8BntPhc9cDy5McBnwHOBU4ffSEqjps2/skHwU+bQhI0ngteB9BVf1g5PWdqvoAzZj+Qp/bApxFMxvo68AVVbUhyZlJzny6hUuSFkeXoaFjRjZ3oekh7NXly6tqPbB+xr5L5jj3zV2+U5K0uLoMDb1/5P0W4Hbg9f2UI0katy5B8JZt9wJs0477S5KmQJe1hv53x32SpJ3QnD2CJEfQLDa3T5JfHjm0N83sIUnSFJhvaOhw4GRgX+CUkf0PAL/RY02S
pDGaMwiq6lPAp5L826r68hhrkiSNUZeLxf+Q5O00w0Q/HhKqql/vrSpJ0th0uVj8ceAngeOBa2mWinigz6IkSePTJQh+uqouAB5qF6B7NfDifsuSJI1LlyB4vP15b5IXAfsAh/ZWkSRprLpcI1ibZD/gAmAd8Gzg3b1WJUkamy7PI7i0fXstsNDS05KkncyCQ0NJfiLJR5J8tt1ekeQt/ZcmSRqHLtcIPkqzlPTz2+1vAOf0VI8kacy6BMEBVXUF8AT8+DkDW3utSpI0Nl2C4KEk+9M+rjLJy4D7eq1KkjQ2XWYNnUszW+inkvwdsBR4Xa9VSZLGZr7VR5dV1ber6qYkr6BZhC7AbVX1+FyfkyTtXOYbGvrkyPu/rKoNVXWrISBJ02W+IMjIe+8fkKQpNV8Q1BzvJUlTZL6LxUcluZ+mZ7BH+552u6pq796rkyT1br4H0ywZZyGSpMnoch+BJGmKGQSSNHAGgSQNnEEgSQNnEEjSwHVZa0iSxuonT79w0iUMij0CSRo4g0CSBs4gkKSBMwgkaeAMAkkauF6DIMkJSW5LsjHJebMc/9UkX21fX0pyVJ/1SJKerLcgSLIEuBg4EVgBnJZkxYzTbgdeUVVHAu8D1vZVjyRpdn32CI4DNlbVpqp6DLgcWD16QlV9qaruaTevAw7qsR5J0iz6DIIDgTtHtje3++byFuCzPdYjSZpFn3cWZ5Z9sz7pLMkraYLg5XMcXwOsAVi2bNli1SdJot8ewWbg4JHtg4C7Zp6U5EjgUmB1Vf1gti+qqrVVtbKqVi5durSXYiVpqPoMguuB5UkOS7IbcCqwbvSEJMuAK4Ffq6pv9FiLJGkOvQ0NVdWWJGcBVwNLgMuqakOSM9vjlwDvBvYHPpQEYEtVreyrJknSk/W6+mhVrQfWz9h3ycj7twJv7bMGSdL8vLNYkgbOIJCkgTMIJGngDIIBW7VqFatWrZp0GVPD9tTOyiCQpIEzCCRp4AwCSRo4g0CSBs4gkKSBMwgkaeAMAkkaOINAkgbOIJCkgTMIJGngDAJJGjiDQJIGziCQpIEzCCRp4AwCSRo4g0CSBs4gkKSBMwgkaeAMAkkaOINAkgbOIJCkgTMIJGngDAJJGjiDQJIGziCQpIEzCCRp4AwCSRo4g0CSBs4gkKSBMwgkaeAMAkkauF37/PIkJwAfBJYAl1bVhTOOpz1+EvAw8OaquqnPmnYUh573mUmXwPc2/QCYfC3fuvDVT/s7Jv3fANPVnhqW3noESZYAFwMnAiuA05KsmHHaicDy9rUG+HBf9UiSZtfn0NBxwMaq2lRVjwGXA6tnnLMa+Fg1rgP2TfK8HmuSJM2Qqurni5PXASdU1Vvb7V8DXlpVZ42c82ngwqr6Yrt9DfCuqrphxnetoekxABwO3NZL0cN0AHD3pIuYIrbn4rEtF9chVbV0tgN9XiPILPtmpk6Xc6iqtcDaxShK/1qSG6pq5aTrmBa25+KxLcenz6GhzcDBI9sHAXdtxzmSpB71GQTXA8uTHJZkN+BUYN2Mc9YBb0rjZcB9VfXdHmuSJM3Q29BQVW1JchZwNc300cuqakOSM9vjlwDraaaObqSZPnpGX/VoTg65LS7bc/HYlmPS28ViSdLOwTuLJWngDAJJGjiDQJIGziDQv9Ku/6RFZrsuHtty8XmxWAAkeQFwd1XdnyTl/xhPS5KXA8to/sb+16Tr2ZklWQUcATxeVR+ZbDXTyR6BSLKaZprvBUn2r6ryX13bL8mJwKXA84EPJHnvZCvaebVteTGwFbionZKuRWaPYOCS7A98ArgJuAfYHXh/Vf3QnsFT1/as/gr4nar6XJLDaBZcfANwh+3ZXdt2V9K05TVJfoVm9YHrqurLk61uuhgEIsnhwD8DRwMnA48Bf1xV359kXTujJAcDR1XVp5M8k2Y9rfXA26vq65OtbueSZD/guVV1W5IDga/RhOwK4Frg/Kp6YpI1TguHhgYqybNHNu+pqnur6vM0y37s
BvxWe96RSfaeQIk7lW3tWVV30q6OW1WPVtUjNOtn7dqed/SkatxZjLTlPTS9VGiG2da0qxn/Yvt642QqnD69PqFMO6YkuwOnJPkR8CxgWZIPVtWPgL+l+Vfsv0/yBeCFwIuB+ydW8A5upD0fAZ4JHJrkj4CtVbUV2B/YPcnpwHuSvNze1uxmtOXuwCFJ/hC4qaquT7JLVd2d5AqanqsWgUEwTI8D/0gzdr078OKq+lGSXatqC3BtktfTrAy7qqpcEXZ+s7XnY0meQXOR85vAfwGeA7zWEJjXzLY8sl23bBeAqnqifbbJLwOvm1yZ08WhoQHZNhOo/VfqA8C9wDeAU9r9W5IsacdmjwdeU1W3TKjcHV6H9ny8PfVu4FjgbVV16/gr3fHN05Ynt/ufSLJnkpOB3wZ+tar+aULlTh0vFg/E6AygJPtW1b3t+58FzgM+W1UfTnIssAGodqhIs3gK7XkMzdDa/22vH2iGp9CWRwMHAjdU1T9Pqt5pZBAMTJJzgVfQdLvXVtUn2rna7wQeAp4LvMHhoG4WaM+HgX2B06vqe5OrcufQoS33o2lLn1myyBwaGpAk/5Fm2OJ0mq73FUneXFWfpeluf5NmZoYh0EGH9txIM23UEFhAx7b8TUOgH14sHoh2DPYhmifFvY3m2dAnAJ9pZ2JcBvynCZa4U7E9F49tOXkODU2h9g8r2262SbJbVT3Wvn8e8DHgrVV1R5K/Bo4BfgZ40Bt0nsz2XDy25Y7JHsF02rOqHgRIcg7wU0kOAM6nmcHyLeC4JMfT3Pz0tqryPoG52Z6Lx7bcAXmNYMok+UXgg+37NwKrgXcB/45m/P9+mlv1XwGcBfyp1wTmZnsuHttyx+XQ0BRpF5D7S5pZFg8A5wL/EziO5kLcL1XVoyPn71NV902i1p2B7bl4bMsdm0EwRZLsRbMo1/3AEpo7NI8FHqSZEvp4kncDVNXvurro/GzPxWNb7tgcGpoiVfUAcA1wEnAd8AGah6NcCRyQ5FSaW/P/qj3fP7R52J6Lx7bcsdkjmDJJDgGWAxcBvwvcSTPeWjQ3N/2Oyxx0Z3suHttyx2UQTKkkL6EZk70AuIKm9/csx123j+25eGzLHY/TR6dUVd2Y5LU03fH9qupDgH9o28n2XDy25Y7HHsGUS/Ii4EdV9c1J1zINbM/FY1vuOAwCSRo4Zw1J0sAZBJI0cAaBJA2cQSBJA2cQSNLAGQSSNHAGgSQN3P8DoU4WM0QdTxYAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.bar(range(r['importances_mean'].shape[0]), r['importances_mean'],\n",
    "        yerr=r['importances_std'], align=\"center\")\n",
    "\n",
    "plt.xticks(range(len(X_train.columns)),\n",
    "           np.array(X_train.columns), rotation=45)\n",
    "plt.ylabel('Feature importance')\n",
    "\n",
    "plt.xlim([-1, X_train.shape[1]])\n",
    "plt.ylim([0, 1.])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "74b8a1b1-994e-4410-b3d4-9f9433c00a87",
   "metadata": {},
   "source": [
    "As we can see, the categorical feature is now treated as a single feature.\n",
    "\n",
    "So, what happened and how did this work? The permutation importance function permutes one column of the untransformed test set at a time before passing the data to `model_pipe`. So, `model_pipe` performs the one-hot encoding after the categorical feature has been permuted rather than before, which means that all binary features derived from it change together."
   ]
  },
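  {
   "cell_type": "markdown",
   "id": "sketch-manual-permutation-md",
   "metadata": {},
   "source": [
    "To illustrate the idea, the sketch below permutes the raw `feat3` column by hand and measures how much the accuracy of a pipeline drops. This is a simplified illustration under stated assumptions, not scikit-learn's actual implementation: it uses accuracy via the pipeline's `score` method, and it rebuilds the data and the pipeline under illustrative `demo_` names so that the cell runs on its own. The exact numbers do not need to match the output of `permutation_importance` above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-manual-permutation-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.compose import make_column_selector, make_column_transformer\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n",
    "\n",
    "# Rebuild the toy training and test sets (demo_ names are illustrative)\n",
    "demo_train = pd.DataFrame({'feat1': np.arange(5.),\n",
    "                           'feat2': np.arange(5., 10.),\n",
    "                           'feat3': ['A', 'B', 'B', 'C', 'A']})\n",
    "demo_train['feat3'] = demo_train['feat3'].astype('category')\n",
    "demo_y_train = np.arange(5)\n",
    "\n",
    "demo_test = pd.DataFrame({'feat1': np.arange(4.),\n",
    "                          'feat2': np.arange(4., 8.),\n",
    "                          'feat3': ['A', 'B', 'B', 'C']})\n",
    "demo_test['feat3'] = demo_test['feat3'].astype('category')\n",
    "demo_y_test = np.arange(4)\n",
    "\n",
    "demo_pipe = make_pipeline(\n",
    "    make_column_transformer(\n",
    "        (StandardScaler(), make_column_selector(dtype_include=np.number)),\n",
    "        (OneHotEncoder(), make_column_selector(dtype_include='category'))),\n",
    "    LogisticRegression(random_state=0))\n",
    "demo_pipe.fit(demo_train, demo_y_train)\n",
    "\n",
    "# Accuracy on the unpermuted test set\n",
    "baseline = demo_pipe.score(demo_test, demo_y_test)\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "drops = []\n",
    "for _ in range(200):\n",
    "    shuffled = demo_test.copy()\n",
    "    perm = rng.permutation(len(demo_test))\n",
    "    # Shuffle the raw category labels; the pipeline one-hot encodes them afterwards\n",
    "    shuffled['feat3'] = demo_test['feat3'].iloc[perm].values\n",
    "    drops.append(baseline - demo_pipe.score(shuffled, demo_y_test))\n",
    "\n",
    "print('mean accuracy drop for feat3:', np.mean(drops))\n",
    "print('std of accuracy drop:        ', np.std(drops))"
   ]
  },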
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ed13aea-0a44-49ba-b936-55b239f4dae5",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
